Background: The carbon stored in vegetation varies across tropical landscapes due to a complex mix of climatic
and edaphic variables, as well as direct human interventions such as deforestation and forest degradation. Mapping
and monitoring this variation is essential if the success or failure of policy developments such as REDD+ (Reducing
Emissions from Deforestation and Forest Degradation) is to be assessed.

Results: We produce a map of carbon storage across the watershed of the Tanzanian Eastern Arc Mountains (33.9
million ha) using 1,611 forest inventory plots, and correlations with associated climate, soil and disturbance data. As
expected, tropical forest stores more carbon per hectare (182 Mg C ha⁻¹) than woody savanna (51 Mg C ha⁻¹). However,
woody savanna is the largest aggregate carbon store, with 0.49 Pg C over 9.6 million ha. We estimate the whole
landscape stores 1.3 Pg C, significantly higher than most previous estimates for the region. The 95% Confidence
Interval for this method (0.9 to 3.2 Pg C) is wider than that of simpler look-up table methods (1.5 to 1.6 Pg C), suggesting
simpler methods may underestimate uncertainty. Using a small number of inventory plots with two censuses (n = 43)
to assess changes in carbon storage, and applying the same mapping procedures, we found that carbon storage in the
tree-dominated ecosystems has decreased, though not significantly, at a mean rate of 1.47 Mg C ha⁻¹ yr⁻¹ (c. 2% of the
stocks of carbon per year).

Conclusions: The most influential variables on carbon storage in the region are anthropogenic, particularly historical
logging, as indicated by the largest coefficient of an explanatory variable on the response variable. Of the non-anthropogenic
factors, a negative correlation with air temperature and a positive correlation with water availability dominate, having 
smaller p-values than historical logging but also smaller influence. High carbon storage is typically found far from the 
commercial capital, in locations with a low monthly temperature range, without a strong dry season, and in areas 
that have not suffered from historical logging. The results imply that policy interventions could retain carbon 
stored in vegetation and likely successfully slow or reverse carbon emissions. 
Background

Tropical forests are globally significant ecosystems,
accounting for ~50% of global forest area [1], storing
~45% of all carbon in terrestrial vegetation [2-4],
maintaining high biodiversity [5], and providing ecosystem
services, such as timber, non-timber forest products [6], 
and climate change mitigation [7,8]. However, within the 
last few decades, vast areas of tropical forests have been 
converted to other land-uses or degraded. For example, 
between 1990 and 1997, 4.4-7.2 million hectares of 
humid tropical forest were converted each year and an 
additional 1.6-3.0 million hectares of forest were visibly 
degraded [9]. This process increased in the early 2000s, 
with an estimated 5.1-5.7 million hectares of humid 
tropical forest (and 3.5-4.7 million hectares of dry trop- 
ical forest) deforested per year between 2000 and 2005 
[10]. The gradual and sustained reduction in forest quality
and quantity has resulted in substantial emissions of
CO₂ [11]. Globally, deforestation and forest degradation
accounted for 6-20% of anthropogenic GHG emissions
in the 1990s and early 2000s [12-14]. Tropical regions
make a substantial contribution to this, emitting 0.7-1.5
Pg C yr⁻¹ between 1990 and 1999 [9,15-17] and 0.7-1.5
Pg C yr⁻¹ between 2000 and 2007 [13,16-18]. These processes
also impact the future potential of forests to remove
carbon from the atmosphere [7,19,20].

Recently, attempts to mitigate increasing anthropogenic
CO₂ emissions through reducing emissions from
degradation and deforestation (REDD+) have been instigated
[21]. The REDD+ programme is aimed at contributing
to a reduction in greenhouse gas emissions whilst
providing economic incentives for better management 
and protection of forests. This policy has been widely 
welcomed and may provide a financial incentive to sig- 
nificantly reduce carbon emissions [22,23], although the 
equity and justice issues surrounding the impact on local 
livelihoods are actively debated [24,25]. Key technical issues
for the successful implementation of REDD+ include
(but are not limited to) the accuracy of monitoring
systems, preventing leakage and establishing accurate 
historical baselines. Thus, the success of REDD+, in part, 
rests on robust scientific information on the magnitude 
and extent of carbon storage in tropical regions and how 
it changes over time [26]. 

The Intergovernmental Panel on Climate Change (IPCC)
provides a three-"Tier" system through which carbon stocks
and emissions can be reported, each with a different level 
of methodological complexity and accuracy. Tier 1 is the 
simplest method, using global default values obtained from 
the IPCC literature [27,28]. The intermediate Tier 2 level 
improves on Tier 1 by using country specific data. Tier 
3 is the most rigorous approach, using local forest in- 
ventory data, focusing on the direct measurement of 
trees, repeated over a time series [27-29]. Here we
develop a Tier 3 methodology for the Eastern Arc
Mountains (EAM) watershed area.

The estimates become progressively more robust from 
Tier 1 to 3 due to changes in two main systematic errors 
[29]. The first, completeness, refers to the number of 
IPCC carbon pools that are included, with studies in- 
cluding all five pools (aboveground live, litter, coarse 
wood debris [CWD], belowground and soil carbon) con- 
sidered complete. The second, representativeness, de- 
rives from the substantial natural variability in the 
carbon stored across landscapes, even within a biome or 
country [30]. The aboveground biomass of a forest 
within a landscape may differ considerably from global 
default (Tier 1) values or even from country-specific 
(Tier 2) values. For example, in the Peruvian Amazon, 
data from the Los Amigos Conservation Concession [31] 
were shown not to be representative of forests nation- 
ally. Nearby forests situated to the north and south of 
this local study are estimated to contain 20-35% less car- 
bon per unit area [32], suggesting that Los Amigos Con- 
servation Concession is an area of locally high biomass. 
Since Tier 3 methods account for variation observed 
within biomes and countries, the representativeness of 
the carbon estimates is higher than those associated with 
Tier 1 and 2 methodologies [32,33]. 

However, Tier 3 methods are more expensive [34,35] 
and some nations may lack the capacity to adopt such 
methods [36]. Whilst, in some cases, the capability to 
apply Tier 3 guidelines is being rapidly developed, multi- 
temporal inventory data and data on historical carbon 
stock changes can take several decades to accrue [37,38]. 
It is expected that REDD+ requirements will allow data 
provisions from several tiers in a single report. Highly 
variable and/or substantial carbon pools should be esti- 
mated using Tier 3 methodology (e.g. forest aboveground 
live carbon [ALC]), whilst Tier 1 or Tier 2 methodology 
may be sufficient for smaller carbon pools (e.g. CWD) or 
carbon poor land cover categories (e.g. bare ground). 

In Tier 3 methods, in order to extrapolate from plot
data, it is necessary to develop correlations with remotely
sensed data to scale to the study area or country-wide
estimates. Generally, carbon storage is estimated
via statistical correlation with electromagnetic
properties, ground-truthed by volumetric measurements,
such as diameter at breast height (DBH), which are converted
to biomass estimates using allometric equations.
A variety of remotely sensed data sources have been 
employed for carbon mapping and these can be aggre- 
gated into four groups: photographic imagery, RADAR, 
LiDAR, and ancillary geographic information systems 
(GIS) data (see Additional file 1: SI1 for an evaluation of 
each method). Here, we use ancillary GIS data as such 
data have three main advantages: 1) wide availability, often 
free of charge; 2) a suitable resolution (e.g. 90 m [39]); and 
3) correlations with these ancillary GIS data may indicate
which variables directly affect carbon storage. Developing 
an understanding of how these variables influence carbon 
storage is vital for accurate scenarios of future emissions. 
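
The plot-level step mentioned above (DBH converted to biomass with an allometric equation, then expressed as carbon per hectare) can be sketched as follows. This is a minimal illustration only: the power-law form, the coefficients A_COEF and B_EXP, and the carbon fraction are hypothetical placeholders, not the allometric equation or parameters used in this study.

```python
# Hypothetical power-law allometry: biomass_kg = a * WSG * DBH_cm ** b.
# A_COEF and B_EXP are illustrative placeholders, NOT values from the study.
A_COEF = 0.1
B_EXP = 2.5
CARBON_FRACTION = 0.5  # assumed share of dry biomass that is carbon


def tree_biomass_kg(dbh_cm, wsg_g_cm3, a=A_COEF, b=B_EXP):
    """Aboveground dry biomass (kg) of one tree from DBH and wood specific gravity."""
    if dbh_cm <= 0 or wsg_g_cm3 <= 0:
        raise ValueError("DBH and wood specific gravity must be positive")
    return a * wsg_g_cm3 * dbh_cm ** b


def plot_carbon_mg_per_ha(trees, plot_area_ha, carbon_fraction=CARBON_FRACTION):
    """Aboveground live carbon (Mg C per ha) for a plot.

    trees is an iterable of (dbh_cm, wsg_g_cm3) pairs.
    """
    if plot_area_ha <= 0:
        raise ValueError("plot area must be positive")
    total_kg = sum(tree_biomass_kg(dbh, wsg) for dbh, wsg in trees)
    # kg -> Mg (divide by 1000), biomass -> carbon, then scale to one hectare.
    return total_kg / 1000.0 * carbon_fraction / plot_area_ha


# A 0.1 ha plot (the median plot size in this study) with three measured stems.
example_trees = [(12.0, 0.60), (35.5, 0.58), (61.0, 0.64)]
print(round(plot_carbon_mg_per_ha(example_trees, 0.1), 2))
```

Dividing by the plot area is what allows plots of different sizes to be compared on a common per-hectare basis before any correlation with GIS layers is attempted.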

Here, we correlate carbon storage estimates from tree 
inventory plots (n = 1,611, median size = 0.1 ha) with 
data on climatic (e.g. temperature, precipitation, and 
solar radiation), edaphic (e.g. soil water holding capacity 
and soil fertility) and proxy variables for direct human 
interventions (e.g. governance type, distance from the 
main economic demand centres, population pressure, 
and historical logging), and variables that derive from 
climate-human interactions (e.g. burnt area index) for the 
Tanzanian watershed of the Eastern Arc Mountains (here- 
after, EAM [40]), which covers 33.9 million ha (Figure 1; 
see Swetnam et al (2011) [41] for further details). We develop
Tier 3 type correlation equations to estimate the
total ALC stored across the forested and wooded land
cover categories, an advancement on previous Tier 2 esti- 
mates for the region presented in Willcock et al (2012) 
[42]. Additionally, we investigate the most influential cor- 
relates of spatial differences in carbon storage and how 
these result from changes in either species composition af- 
fecting wood density (specific gravity) or the number of 
large trees present. Lastly, a smaller number of inventory 
plots (n = 43, median size 0.1 ha) have two censuses, and 
by applying the same mapping procedures, we assess 
changes in carbon storage over time, providing a first- 
order estimate of sequestration across the region. 

Results 

Carbon stocks 

Utilising 1,611 plots and scaling to the 33.9 million ha
study area we estimate that 1.32 (95% confidence
interval [CI] ranges from 0.89 to 3.16) Pg C was stored
in the aboveground live vegetation in the year 2000
(Figure 2; Table 1). Woodland and bushland contributed
most to the amount of stored aboveground live
carbon (ALC) in the study region, with open woodland
storing the most ALC (0.49 [0.47 to 1.60] Pg C over 9.6
million ha), followed by bushland (0.29 [0.15 to 0.51]
Pg C over 5.0 million ha) and closed woodland (0.18
[0.13 to 0.61] Pg C over 1.8 million ha).

Figure 1 The Eastern Arc Mountains of Tanzania and Kenya [40]. The study area is the Eastern Arc watershed in Tanzania [41].

Best estimate values from our methodology, per unit
area, in each land cover class, are given in Table 2. Forest
contained the greatest ALC per unit area, with highest
values in sub-montane forest (189 [95 to 588] Mg ha⁻¹),
followed by lowland (182 [152 to 360] Mg ha⁻¹), upper
montane (166 [69 to 533] Mg ha⁻¹), montane (130 [62
to 702] Mg ha⁻¹), and forest mosaic (121 [55 to 485]
Mg ha⁻¹). Woodlands held less ALC than forests, with
closed woodland storing 100 (70 to 331) Mg ha⁻¹ and
open woodland storing 51 (38 to 165) Mg ha⁻¹ (Table 2),
but more than the landscape average of 39 (26 to 93)
Mg ha⁻¹.
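
As a consistency check, the per-hectare values and the landscape totals reported above are linked by a simple unit conversion (1 Pg = 10⁹ Mg). The short sketch below uses only areas and best-estimate densities quoted in the text; the function name is our own.

```python
MG_PER_PG = 1e9  # 1 Pg = 10^15 g = 10^9 Mg


def total_carbon_pg(area_million_ha, density_mg_per_ha):
    """Total carbon (Pg C) held by an area with a given mean density."""
    return area_million_ha * 1e6 * density_mg_per_ha / MG_PER_PG


# Areas (million ha) and best-estimate densities (Mg C per ha) from the text.
stocks = {
    "open woodland": total_carbon_pg(9.6, 51),
    "closed woodland": total_carbon_pg(1.8, 100),
    "whole study area": total_carbon_pg(33.9, 39),
}
for name, pg in stocks.items():
    print(f"{name}: {pg:.2f} Pg C")
```

To two decimal places these give 0.49, 0.18 and 1.32 Pg C, matching the best estimates reported for open woodland, closed woodland and the landscape as a whole.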

Our sequestration model suggests that the landscape
may be losing 0.05 (-0.07 to 0.26) Pg C yr⁻¹ (mean net flux
to atmosphere of 1.47 [-2.13 to 7.75] Mg C ha⁻¹ yr⁻¹). Of
the 12.3 million ha of tree-dominated land in our study 
area, only 1.4% (0.17 million ha) shows a carbon decrease 
over the entire 95% CI range and only 0.8% (0.10 million 
ha) a definite carbon increase (Figure 3). The locations 
showing net carbon uptake are in the Udzungwa moun- 
tains, while the locations with net reductions in carbon 
storage are mainly in the Pare and Usambara mountains. 
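
The landscape-scale flux and the areas quoted in this paragraph can be reproduced from the per-hectare figures. The sketch below assumes, as the reported numbers imply, that the mean flux is scaled over the full 33.9 million ha study area and that the percentages refer to the 12.3 million ha of tree-dominated land; the function names are our own.

```python
def landscape_flux_pg_per_yr(flux_mg_per_ha_yr, area_million_ha):
    """Scale a per-hectare flux (Mg C per ha per yr) to Pg C per yr."""
    return flux_mg_per_ha_yr * area_million_ha * 1e6 / 1e9


def share_of_area_million_ha(percent, total_million_ha):
    """Area (million ha) corresponding to a percentage of a total area."""
    return percent / 100.0 * total_million_ha


# Mean and 95% CI bounds of the net flux to the atmosphere, from the text.
for flux in (1.47, -2.13, 7.75):
    print(flux, round(landscape_flux_pg_per_yr(flux, 33.9), 2))

print(round(share_of_area_million_ha(1.4, 12.3), 2))  # area with a definite decrease
print(round(share_of_area_million_ha(0.8, 12.3), 2))  # area with a definite increase
```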

Links between carbon stock and influential variables 

The variables that influence carbon storage and sequestra- 
tion may be inferred from relationships within the correl- 
ation models. Forward selection results are presented in 
the following paragraphs as these best indicate causal rela- 
tionships [43-45]. In general, backward models were in 







Figure 2 Aboveground live carbon storage in the study area (a), with upper (b) and lower (c) pixel based 95% CI. See text for details 
on Methods. 
Table 1 Aboveground live carbon stored within the study area for the year 2000, estimated by this and previous studies

Study | Aboveground live carbon, Pg (95% CI range) | Methodology | Resolution (m) | Disturbance included? | Tanzanian on-the-ground data?
------|------|------|------|------|------
Present study* - Tier 3 | 1.32 (0.89-3.16) | Correlation equations derived using remotely sensed influential variables. | 100 | Anthropogenic variables represent human disturbance. Natural disturbance variables also included. | Yes
Willcock et al (2012)* - Original Tier 2 [42] | 1.58 (1.56-1.60) | Land cover based look-up table. | 100 | Only where land cover categories are identified as disturbed (e.g. cropland mosaics). | Yes
Willcock et al (2012) - Harmonised Tier 2 [42] | 1.64 (1.52-1.76) | Land cover based look-up table. | 100 | Only where land cover categories are identified as disturbed (e.g. cropland mosaics). | Yes
Baccini et al (2012) - Tier 1 [3] | 2.03 | Derived from MODIS and GLAS LiDAR data. | 500 | Partially includes disturbance through impacts on canopy heights. | Yes
Saatchi et al (2011) - Tier 1 [4] | 0.83 | Derived from MODIS, SRTM, QSCAT and GLAS LiDAR. | 1000 | Partially includes disturbance through impacts on canopy heights. | No
Hurtt et al (2006) HYDE-SAGE - Tier 1 [46] | 0.63 | Modelled from the Miami LU ecosystem model with cropland data from the Centre for Sustainability and the Global Environment. | ~110,000 | Contains simple submodels of natural plant mortality, disturbance from fire, and organic matter decomposition, as well as wood harvesting. | No
Hurtt et al (2006) HYDE - Tier 1 [46] | 0.41 | Modelled from the Miami LU ecosystem model. | ~110,000 | Contains simple submodels of natural plant mortality, disturbance from fire, and organic matter decomposition, as well as wood harvesting. | No
Baccini et al (2008) - Tier 1 [47] | 0.34 | Derived from MODIS and GLAS LiDAR data. | 1000 | Partially includes disturbance through impacts on canopy heights. | No

*This study and Willcock et al (2012) are not independent as they are derived from the same underlying data and utilise the same look-up table values.

close agreement with forward models (Tables 3 and 4;
Additional file 1: Tables S1-S3).

Carbon storage (adjusted R-squared [Adj R-sq] = 0.18)
is correlated positively with the natural logarithm of
the population pressure with decay constant of 12.5 km
(p-value < 0.001) and increased by 1 Mg ha⁻¹ for every
8700 km from a road (p-value < 0.010), and every 30,000
units in the cost distance to Dar es Salaam (p-value <
0.010). Carbon storage decreased by 1 Mg ha⁻¹ for every
1°C increase in mean annual monthly temperature range
(p-value < 0.001), every 2.7% rise in the total available
water capacity of the soil (p-value < 0.001), and every

Table 2 The mean (and 95% CI) estimates of forest characteristics investigated in this study (carbon storage, carbon
sequestration, WSG, the intercept from the power law relationship and the gradient from the power law relationship)
separated by land cover category

Land cover category [41] | Carbon storage (Mg ha⁻¹) | Carbon sequestration (Mg ha⁻¹ yr⁻¹) | WSG (g cm⁻³) | The intercept from the power law relationship | The gradient from the power law relationship
------|------|------|------|------|------
Lowland forest (<1000 m) | 182 (152 to 360) | -0.91 (-7.08 to 4.29) | 0.60 (0.59 to 0.60) | 6.01 (2.94 to 5.17) | -0.93 (-1.04 to -0.82)
Sub-montane forest (1000-1500 m) | 189 (95 to 588) | -2.02 (-11.06 to 1.29) | 0.58 (0.57 to 0.58) | 5.95 (3.68 to 8.23) | -1.31 (-1.48 to -1.14)
Montane forest (1500-2000 m) | 130 (62 to 702) | -2.03 (-11.85 to 1.07) | 0.60 (0.59 to 0.60) | 6.95 (3.51 to 10.39) | -1.57 (-1.82 to -1.32)
Upper-montane forest (>2000 m) | 166 (69 to 533) | -2.08 (-10.49 to 1.23) | 0.60 (0.58 to 0.60) | 7.03 (4.60 to 9.45) | -1.61 (-1.93 to -1.26)
Forest mosaic | 121 (55 to 485) | -1.18 (-6.69 to 2.92) | 0.56 (0.56 to 0.56) | 9.22 (6.98 to 11.46) | -1.90 (-1.99 to -1.81)
Closed woodland | 100 (70 to 331) | -1.24 (-7.91 to 2.63) | 0.64 (0.62 to 0.65) | 6.67 (4.95 to 8.60) | -1.55 (-1.85 to -1.30)
Open woodland | 51 (38 to 165) | -1.49 (-7.53 to 2.05) | 0.61 (0.59 to 0.62) | 6.38 (4.88 to 7.82) | -1.45 (-1.70 to -1.19)

4.4 month increase in the mean number of dry
months annually (p-value < 0.050). Carbon storage was
2.1 Mg ha⁻¹ lower in areas where historical logging
was present (p-value < 0.010), and 4.2 Mg ha⁻¹ higher in
areas under the control of local communities/governments 
(p-value < 0.010). Thus, carbon storage is high in areas 
far from the commercial capital, with a low monthly 
temperature range, without a dry season, that have not 
suffered from historical logging and are under local 
community/government control (Figure 4; Table 3). 
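
The "1 Mg ha⁻¹ for every x units" statements above are the reciprocals of the forward-selection coefficients in Table 3. The sketch below makes that link explicit under the assumption that the coefficients act linearly and additively on carbon storage; the dictionary keys and function names are our own labels, and only the continuous predictors are included.

```python
# Forward-selection coefficients transcribed from Table 3
# (change in carbon storage, Mg per ha, per unit of the predictor).
FORWARD_COEFFICIENTS = {
    "distance_to_roads": 1.15e-4,
    "cost_distance_to_dar_es_salaam": 3.41e-5,
    "monthly_temperature_range_degC": -9.79e-1,
    "soil_available_water_capacity_pct": -3.75e-1,
    "dry_months_per_year": -2.28e-1,
}


def units_per_mg_change(coefficient):
    """Predictor change associated with a 1 Mg per ha change in storage."""
    if coefficient == 0:
        raise ValueError("coefficient must be non-zero")
    return 1.0 / abs(coefficient)


def storage_change(deltas, coefficients=FORWARD_COEFFICIENTS):
    """Modelled change in storage (Mg per ha) for given predictor changes.

    Assumes a purely linear, additive model (our simplifying assumption).
    """
    return sum(coefficients[name] * delta for name, delta in deltas.items())


for name, coef in FORWARD_COEFFICIENTS.items():
    direction = "increase" if coef > 0 else "decrease"
    print(f"{name}: 1 Mg/ha {direction} per {units_per_mg_change(coef):.3g} units")
```

For example, the soil water capacity coefficient of -0.375 corresponds to a 1 Mg ha⁻¹ decrease for roughly every 2.7% rise, and the dry-month coefficient of -0.228 to roughly every 4.4 months, as stated in the text.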

The rate of carbon sequestration correlated with 
three principal component (PC) axes (presented in 
order of influence; Adj R-sq = 0.41). Carbon sequestra- 
tion was negatively correlated with the soil fertility axis 
(PC5; p-value < 0.050), warmer temperatures and longer 
dry seasons (PC3; p-value < 0.050), and with increased anthropogenic
disturbance (PC1; p-value < 0.010). Thus, carbon
sequestration was highest in less fertile areas with
little or no drought and little anthropogenic disturbance
(Table 4).

Wood specific gravity (WSG; Adj R-sq = 0.28; see 
Additional file 1: SI2) was most strongly affected by the 
annual mean burned area probability (increasing by 
1 g cm⁻³ for every 0.04 increase; p-value < 0.001) and
the total available water capacity of the soil (decreasing
by 1 g cm⁻³ for every 82.0% increase; p-value < 0.001).
Thus, WSG is higher in burnt areas with little available
water (Additional file 2: Figure S1; Additional file 3:
Figure S2; Additional file 1: Table S1).

The intercept of the power law relationship (an indica- 
tion of potential stem density [see Additional file 1: SI3]; 
Adj R-sq = 0.30) was most affected by the natural loga- 
rithm of the population pressure with decay constant of 
12.5 km (positive correlation; p-value < 0.001) and the 
mean annual monthly temperature range (increasing by 
1.0 for every 1.2°C increase; p-value < 0.001). Thus, the 
Table 3 The coefficients and associated p-values of the variables correlated with aboveground carbon storage using
both forward and backward selection procedures

Variable (where appropriate, units are given in brackets) | Group | Forward coefficient | Forward p-value | Backward coefficient | Backward p-value
------|------|------|------|------|------
(Intercept) | n/a | -1.21E+03 | 3.14E-03 | -2.80E+00 | 7.55E-01
Natural logarithm of the population pressure with decay constant of 12.5 km | Anthropogenic | 1.06E+00 | 1.06E-05 | 1.42E+00 | 2.27E-06
Natural logarithm of the population pressure with decay constant of 16.7 km | Anthropogenic | n/a | n/a | 1.42E+00 | 2.27E-06
Distance to roads (km) | Anthropogenic | 1.15E-04 | 1.09E-03 | 1.78E-04 | 1.30E-05
Historical logging - Partially logged (no logging/partially logged) | Anthropogenic | -2.10E+00 | 1.09E-03 | -3.83E+00 | 4.97E-07
Cost distance to Dar es Salaam | Anthropogenic | 3.41E-05 | 2.00E-03 | 2.58E+00 | 5.46E-03
Natural logarithm of the cost distance to market towns | Anthropogenic | -6.05E-01 | 5.24E-02 | -9.85E-01 | 1.89E-02
Governance - local (national/local/joint/unknown) | Anthropogenic | 4.24E+00 | 9.29E-03 | n/a | n/a
Governance - national (national/local/joint/unknown) | Anthropogenic | -7.95E-03 | 9.78E-01 | n/a | n/a
Governance - unknown (national/local/joint/unknown) | Anthropogenic | 6.26E-01 | 7.10E-01 | n/a | n/a
Mean annual monthly temperature range (°C) | Climatic | -9.79E-01 | 2.00E-16 | -1.15E+00 | 1.98E-13
Mean annual minimum monthly temperature (°C) | Climatic | n/a | n/a | 1.09E+00 | 3.07E-16
Mean annual maximum monthly temperature (°C) | Climatic | n/a | n/a | -1.15E+00 | 1.98E-13
Mean number of dry months annually | Climatic | -2.28E-01 | 2.57E-02 | -3.09E-01 | 5.58E-03
Total available water capacity of the soil (vol. %, -33 to -1500 kPa conforming to USDA standards) | Edaphic | -3.75E-01 | 1.16E-05 | -8.59E-01 | 3.05E-05
Total nitrogen content of the soil (g kg⁻¹) | Edaphic | n/a | n/a | -4.13E-01 | 2.50E-03
Total carbon content of the soil (g kg⁻¹) | Edaphic | n/a | n/a | 6.18E+00 | 1.15E-03
pH of the soil (pH) | Edaphic | n/a | n/a | 1.73E+00 | 2.96E-02
Spatial autocorrelation term 5 | Spatial | 6.45E+01 | 3.15E-03 | 6.60E+00 | 1.18E-01
Spatial autocorrelation term 7 | Spatial | -8.48E-01 | 3.57E-03 | -1.71E-01 | 1.45E-01
Spatial autocorrelation term 4 | Spatial | n/a | n/a | 6.60E+00 | 1.18E-01
Spatial autocorrelation term 3 | Spatial | n/a | n/a | -1.71E-01 | 1.45E-01

density of smaller stems increases in areas with a high 
population pressure and large temperature fluctuations 
(Additional file 3: Figure S2; Additional file 4: Figure S3; 
Additional file 1: Table S2). 

Correlations identified for the gradient of the power 
law relationship (an indication of the proportion of 
larger stems; see Additional file 1: SI3) were broadly 
the inverse of those identified for the intercept. The 
gradient of the power law relationship was most af- 
fected by the natural logarithm of the population pres- 
sure with decay constant of 20.8 km (negative correlation; 
p-value < 0.001) and the mean burned area probability in 



Table 4 The coefficients and associated p-values of the
variables correlated with aboveground carbon
sequestration

Variable | Coefficient | p-value
------|------|------
(Intercept) | 0.032 | 0.890
PC1 | -0.112 | 0.006
PC3 | -0.255 | 0.010
PC5 | -0.412 | 0.012

the fourth quarter (decreasing by 1.0 for every 0.2 in- 
crease; p-value < 0.001). Thus, the proportion of large 
stems was greater in areas experiencing few distur- 
bances from people or fire (Additional file 3: Figure S2; 
Additional file 5: Figure S4; Additional file 1: Table S3). 
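
To make the two size-frequency parameters concrete, the sketch below fits a power law N = exp(intercept) × D^gradient to stem counts per diameter class by ordinary least squares on log-transformed values. It is an illustration under our own assumptions; the binning and fitting procedure actually used is described in Additional file 1: SI3 and is not reproduced here, and the example data are synthetic.

```python
import math


def fit_power_law(diameters_cm, stems_per_ha):
    """Least-squares fit of ln(N) = intercept + gradient * ln(D).

    Returns (intercept, gradient). A larger intercept indicates more stems
    overall; a less negative gradient indicates a larger share of big stems.
    """
    if len(diameters_cm) != len(stems_per_ha) or len(diameters_cm) < 2:
        raise ValueError("need at least two matching diameter/count pairs")
    xs = [math.log(d) for d in diameters_cm]   # raises ValueError if d <= 0
    ys = [math.log(n) for n in stems_per_ha]   # raises ValueError if n <= 0
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("diameter classes must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return intercept, gradient


# Synthetic stem counts generated from a known power law (hypothetical data).
classes_cm = [15.0, 25.0, 35.0, 45.0, 55.0]
counts = [math.exp(6.0) * d ** -1.5 for d in classes_cm]
print(fit_power_law(classes_cm, counts))
```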

When investigating the most influential correlates of 
spatial differences in carbon storage and how these re- 
sult from changes in either species composition affecting 
wood density (specific gravity) or the number of large 
trees present, we found that the final Tier 3 carbon storage
estimates were positively correlated with both size-frequency
distribution estimates (both intercept and gradient
[p-values < 0.001]), and negatively correlated with
WSG estimates (p-value < 0.001) and maximum height
estimates (p-value < 0.001; see Additional file 1: SI4). All
possible interactions were investigated and were significant
(Adj R-sq = 0.35; p-values < 0.001); however, the
majority of the explanatory power lay within the second
order interactions (Adj R-sq = 0.33; p-values < 0.001;
Additional file 1: Table S5). Broadly, WSG and the proportion
of larger stems had the largest influence over the
carbon storage estimate. Considering only second order
Figure 4 The modelled effect of most influential, significant anthropogenic (a, b, and c), climatic (d and e) and edaphic (f) variables of
aboveground live carbon storage. Dashed red lines indicate the modelled 95% CI. The data is indicated by black lines above the x-axis.
Panels: historical logging; governance; population pressure; temperature range; number of dry months; total available water capacity.



interactions, in areas of low potential stem density, car- 
bon storage is positively correlated with maximum can- 
opy height (Additional file 6: Figure S5). However, the 
opposite correlation is observed in areas of higher stem 
density. Although similar interactions are observed be- 
tween both size-frequency distribution estimates (gradi- 
ent and intercept), the interaction between WSG and 
maximum canopy height is inverse, with carbon storage 
only showing positive correlations with maximum can- 
opy height in areas of high WSG. Both size-frequency 
distribution estimates also interacted similarly with 
WSG, with both showing positive correlations with car- 
bon storage in areas of low WSG, but negative correla- 
tions in areas of high WSG (Additional file 6: Figure S5). 
Finally, carbon sequestration correlation values were positively
correlated with carbon storage estimates (p-value <
0.001), indicating that areas storing the most carbon are
also those that are increasing in stock at the fastest rate.

Discussion 

Tier 3 correlation-based method vs. Tier 1 and 2 methods 

Our estimate of 1.3 Pg C stored across the 33.9 million
hectares is larger than most previous Tier 1 estimates
[46-48], although below the most recently produced estimate
[3] (Table 1). Underestimation of the amount of
carbon stored in the EAM region in global analyses can
be a result of their poor resolution and/or application of
data from other regions which may differ systematically
compared to East African forests, woodlands and savannas
[42]. When separated by land cover category, our
locally derived carbon estimates are comparable to those 
presented in other local [49-52] and global studies, the 
latter often containing little or no data from East Africa 
[3,4,46,47,53]. This suggests differences between our es- 
timates and other studies have arisen because many 
previous studies mapped carbon storage at lower reso- 
lution [3,4,46,47,53]. When considering homogenous 
landscapes, scale effects are unlikely to cause a dra- 
matic difference in carbon estimates. However, in highly 
fragmented and heterogeneous landscapes, such as East 
Africa, the effects of scale are likely to be substantial. For- 
est fragments, typically of high carbon storage, may be 
omitted at lower resolutions, being 'replaced' by more
dominant, but low carbon, land cover categories (e.g. open 
woodland), resulting in underestimation of carbon storage. 

It must be noted that the landscape-scale confidence intervals
surrounding our Tier 3 estimates are considerably
wider than those around previous estimates [3,4,42,47,53].
This result is consistent with Hill et al (2013), who also
showed increasing methodological sophistication does not
necessarily result in reduced uncertainty, as is often assumed
[54]. Confidence intervals derived from look-up
table values may show a systematic bias. The ranges pro- 
vided are an artefact of the study area, the number of land 
cover categories and the resolution, as when summed 
across a large number of pixels, pixel error is mostly ne- 
gated as underestimates in one part of the landscape are 
counterbalanced by overestimates in other parts. The 95% 
CI developed from correlation equations are effectively 
based on numerous continuous variables, containing the 
uncertainty relating to anthropogenic, climatic and ed- 
aphic variables, thus have many thousands of possible 
combinations, severely limiting the ability of the 'law of
averages' to act. Hence, the 95% CI presented in this investigation
may better reflect that of the actual landscape,
containing more variables that make-up the complex 
landscape heterogeneity (i.e. improved representativeness), 
although this is only true for those pixels estimated using 
the correlation equations (86% of the EAM but only 52% 
of the study area). Therefore, the look-up table 95% CI 
presented in Willcock et al (2012), and used in this study, 
may underestimate uncertainty [42]. Future studies should 
expand the existing plot network (Figure 1), enabling the 
correlation equations (and improved 95% CI) to be ap- 
plied to the entire study area. This process has already 
begun under a new WWF-REDD+ project (which focusses 
on better sampling the data-deficient land cover categories 
identified in this study [55]) and the National Forest Mon- 
itoring and Assessment (NAFORMA) project [56,57]. 
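
The 'law of averages' argument above can be illustrated numerically. The sketch below computes the relative uncertainty of a landscape total summed over many pixels, first when pixel errors are independent (so over- and underestimates cancel) and then when they are correlated through shared explanatory variables. The pixel count, mean, standard deviation and correlation values are hypothetical and chosen only to show the effect.

```python
import math


def relative_sd_of_total(n_pixels, pixel_mean, pixel_sd, correlation=0.0):
    """Relative standard deviation of a total summed over n identical pixels.

    correlation is the (assumed common) pairwise correlation between pixel
    errors: 0 means fully independent errors, 1 means fully shared errors.
    """
    if n_pixels < 1 or pixel_mean <= 0 or pixel_sd < 0:
        raise ValueError("invalid pixel parameters")
    if not 0.0 <= correlation <= 1.0:
        raise ValueError("correlation must lie between 0 and 1")
    variance = n_pixels * pixel_sd ** 2 * (1.0 + (n_pixels - 1) * correlation)
    return math.sqrt(variance) / (n_pixels * pixel_mean)


# Hypothetical landscape: 10,000 pixels, each 50 Mg/ha with an SD of 25 Mg/ha.
for rho in (0.0, 0.1, 1.0):
    print(rho, round(relative_sd_of_total(10_000, 50.0, 25.0, rho), 4))
```

With independent errors the relative uncertainty of the total shrinks with the square root of the number of pixels, which is why look-up table intervals can appear very narrow; any shared error between pixels prevents this cancellation.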

Links between carbon stock and influential variables 

The results presented here indicate that ALC storage in 
tree-dominated ecosystems is correlated with anthropo- 
genic, climatic and edaphic variables. However, in all our 
models there is a large amount of unexplained variation 
(R-squared values for our correlation models vary be- 
tween 0.18 and 0.41). This is likely to be due to three 
main reasons (Additional file 1: SI6). Firstly, although we 
used the highest resolution datasets that are freely avail- 
able, several of the associated variables are of relatively 
poor resolution across the EAM (including wind, light
and soil nutrient variables [Additional file 1: Table S6]). 
This is particularly important here as low resolution GIS 
data is unlikely to correlate well with the response vari- 
ables from our plot network as many plots (with high 
variance [58]) may fall within a single cell [59]. Thus, 
our study may be biased against retaining low resolution 
explanatory variables in our models. Secondly, contem- 
porary forest characteristics are the result of growth, re- 
cruitment and mortality over many years. It is difficult 
to obtain data on historical variables and yet these could 
have had a significant impact on present day carbon 
storage and other forest characteristics [60]. Thirdly, 
present day information is also lacking; for example,
datasets describing physical soil properties in the study
area are unavailable. Thus, future work is needed to de- 
velop additional high resolution GIS data, particularly 
for historic time periods. 

Of the variance explained in our forward and back- 
ward models, direct anthropogenic factors are the most 
influential explanatory variables (as noted by the largest 
coefficient of explanatory variables on the response vari- 
able, in contrast to those [e.g. temperature] with smaller 
p-values but also smaller influence [Table 3]) and so are 
the focus of our remaining discussion (see Additional 
file 1: SI5 for discussion of climatic and edaphic variables). 

Within our study area, people are clustered around 
high carbon areas (Figure 4). We suggest this could be 
due to these areas having favourable climatic conditions 
with more moisture for plant (and thus crop) growth. 
Further, the incidence of malaria is lower at high eleva- 
tions [61], making these locations more habitable for hu- 
man populations. Thus there is a peak in population 
density near the base of high-carbon montane forests 
[40]. Our interpretation that it is the landscape suitabil- 
ity driving human population density is consistent with 
the observation that when individual localities are 
followed over time, degradation at the local level caused 
by the population is evident [62,63]. This emphasises 
that our results are not proof of causation and that the 
drivers may be a correlate of the explanatory variables 
retained in our models (Additional file 1: SI6). Our re- 
sults also show a decrease in carbon storage in previ- 
ously logged areas and in areas nearer the commercial 
capital, Dar es Salaam. This confirms previous reports 
that areas near the capital have lower biomass due to 
local demand for low grade timber in the city, as well 
as international demand for high grade timber via the 
city's port [62], emphasising the connections between 
the rural and urban landscape, and how the sphere of 
urban influence drives change in rural ecosystems. Fu- 
ture investigations should use simulation modelling and 
direct experimentation to identify if the influential vari- 
ables highlighted here can be confirmed as drivers of 
carbon storage and sequestration, providing a deeper un- 
derstanding of the process-based relationships. 

The decrease in carbon storage as a result of logging 
(51-77% of the ALC is retained) is of similar magnitude 
to other reported estimates [64]. However, the historical 
logging data we utilised was based on expert opinion 
(Additional file 1: Table S6) so, given its importance, fur- 
ther work developing and evaluating historical variables 
is needed (Additional file 1: Table S7). We observe a 
comparable decrease due to differing governance. Land 
under national control holds between 40% and 65% of 
the ALC stored in areas under decentralised governance. 
This perhaps indicates that decentralisation of management 
(e.g. participatory and community led forestry) is 
successful in our study area [37,65]. However, it is not 
possible to prove causation within the framework of this 
study. Many locally managed forests are located in the 
south-east of our study area within an area of naturally 
high carbon storage, whereas land under national con- 
trol covers much larger areas, including the dry, carbon- 
poor east. Hence, our finding that carbon storage is 
higher in areas under decentralised control may be an 
artefact of the differing areas where this type of land 
management occurs. Further studies monitoring change 
in carbon storage over time under the two different gov- 
ernance regimes would enable the effect of land manage- 
ment to be determined. 

The overall effects on carbon storage are a result of 
many changes in forest characteristics. Both WSG and 
the proportion of larger stems decrease with increasing 
anthropogenic disturbance; however, stem density (≥10 cm 
DBH) increases. Anthropogenic disturbance, for 
example logging, is often a commercial activity and re- 
sults in the preferential removal of the largest, most 
valuable stems [62]. The more open canopy, following 
stem removal, would result in increased recruitment 
from young forest trees [66], leading to the high num- 
bers of small stems observed. However, the opposite 
would be expected in woodlands and savannas, with 
more open canopies resulting in more grass, high fire in- 
tensity and so less recruitment [67,68]. Our results high- 
light how influential the negative effect of people on 
tropical forest carbon storage can be. This assertion is 
supported by data from across the tropics [69-71]. The 
significant impact of anthropogenic activities implies 
that REDD+ could, at the local scale, have significant 
positive impacts on carbon storage. However, careful 
policy designs to limit leakage of deforestation and en- 
courage the involvement of the local population are 
needed to ensure REDD+ schemes achieve their carbon 
storage and sequestration aims [72]. 

Like carbon storage and its components, carbon se- 
questration is also correlated with anthropogenic, cli- 
matic and edaphic variables. We estimate that some 
localities (for example the Udzungwa Mountains Na- 
tional Park; Figure 4) provide a carbon sink of compar- 
able per-area magnitude to modelled estimates in East 
Africa [73] and to that observed over recent decades in 
structurally intact African forest [7]. However, many 
areas of forest and woodland within the study area ex- 
perience a high level of degradation and disturbance, 
and so are net sources. Here, we have shown that an- 
thropogenic disturbance is a key determinant of the 
trend in carbon storage over time in eastern Tanzania. 
Important locations of high carbon losses are the Pare 
and Usambara mountains (Table 5), which historically 
have seen the highest rates of degradation and disturbance 
[74]. The national population of Tanzania is increasing 
[75] and this may increase the pressure on tree-dominated 
ecosystems which could result in the study area becoming 
a significant source of carbon in the future. Furthermore, 
the effect of increase in anthropogenic pressures could be 
compounded by potential decrease in carbon storage as a 
result of increasing temperatures [76,77] and changes in 
soil nutrients (see Additional file 1: SI5). However, these 
future effects could be complicated by increasing levels of 
atmospheric CO2, varying effectiveness of legally protected 
areas and shifting consumption patterns. 

Table 5 Carbon stored and sequestered across the 
individual mountain blocks of the EAM range (the total is 
given in the final row) 

Eastern Arc       Area,     Aboveground live carbon storage, Tg    Mean carbon 
Mountain          km²       Tier 3     Willcock et al (2012) -     sequestration, 
Block [40]                             Original Tier 2 [42]        Mg ha⁻¹ yr⁻¹ 

North Pare           510      1.93       2.38                      2.60 
South Pare         2,327      8.96       9.59                      2.41 
West Usambara      2,945     13.52      15.96                      3.64 
East Usambara      1,145      5.91       7.63                      2.79 
Nguu               1,562      9.34      12.71                      1.89 
Nguru              2,565     15.11      18.86                      1.79 
Ukaguru            3,243     13.39      20.63                      1.42 
Uluguru            3,057     15.92      13.91                      1.35 
Rubeho             7,984     36.84      40.96                      1.06 
Malundwe              33      0.29       0.29                      1.80 
Udzungwa          22,788    101.73     104.05                      1.01 
Mahenge            2,606     23.58      12.08                      0.19 
Total             50,765    246.53     259.06                      1.19 

Conclusions 

Our results show that the amount of carbon stored in 
forests across 33.9 million ha of the Eastern Arc Moun- 
tains of Tanzania is considerable: 1.32 (0.89 to 3.16) Pg. 
Our estimate is significantly higher than most previous 
estimates. However, our more sophisticated method also 
has higher uncertainty, implying that other methods 
may substantially underestimate the uncertainty in- 
volved. Within the tree-dominated land cover categories, 
historical logging is the most influential direct anthropo- 
genic factor, while the mean number of dry months is 
the most influential environmental factor, with an order 
of magnitude less impact on carbon storage. We show 
that WSG, size-frequency distribution variables and 
height variables are all important in determining carbon 
storage. Our estimates indicate that, between 2004 and 
2008, tree-dominated communities across the study 
area showed no significant change; however, some areas 
were identified as large sinks (0.8% of the study area) 
and others as large sources (1.4% of the study area), showing 
the importance of taking a landscape scale approach. 
The carbon maps produced and statistical relationships 
documented can assist policy-makers in designing pol- 
icies to maintain and enhance carbon storage for climate 
mitigation and other ecosystem services. 

Method 

We collated data from 2,462 tree inventory plots within 
our study area (see Additional file 1: SI3), then applied a 
quality control and standardisation protocol. This con- 
sists of two main steps: (1) Metadata quality control; and 
(2) Measurement bias detection. 

Firstly, all plots lacking a recorded spatial location and 
a fixed area were discarded (770 plots). Plots where one 
or more diameter at breast height (DBH) measurements were 
known to be missing were also excluded (7 plots). Furthermore, 
plots smaller than 0.025 ha (16 plots) were 
deemed to produce unreliable carbon estimates and so were 
also removed from the dataset. 

Secondly, to assess possible measurement bias, i.e. not 
measuring over buttresses and so overestimating biomass 
[78], the remaining plots were grouped by the lead field 
researcher. Size-frequency distributions, using 10 cm size 
classes, were created for each of these groups. Forest size- 
frequency distributions are suggested to conform to the -2 
power law based on metabolic scaling [79]. Although it 
has been argued that this rule is not globally applicable 
[80], many studies accept this as a theoretical maximum 
value for the abundance of large stems [81]. Thus, researchers 
with many plots above this maximum value 
likely measured stems around buttresses, and so their plots 
were removed (1 researcher, 100 plots). 

The quality control and standardisation procedure re- 
sulted in a dataset of 1,611 tree inventory plots (median 
0.1 ha, mean 0.1 ha, mode 0.1 ha [43 plots with mul- 
tiple censuses; median 0.1 ha, mean 0.5 ha, mode 
1.0 ha]; Figure 1; see Additional file 1: SI3 for further 
information) from which we calculated plot-level stand 
structure indices and aboveground carbon storage per 
unit area (see Additional file 1: SI2 for full details). We obtained 
the exponent and intercept of the population size-frequency 
distribution using the power law fit for each 
plot using the log-log transformation method: for each 
plot, we created 10 cm bin size-frequency distributions 
based on DBH, and fitted a linear model of the logarithm 
of frequency against the logarithm of the size class. 
Whilst not as accurate as the maximum likelihood 
estimation method, our simpler method is more 
stable for many of our plots, providing both the intercept 
and slope as indicators of population structure [82]. 
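
The log-log fit described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the use of class midpoints as the size variable, base-10 logarithms and the dropping of empty size classes are our choices here, not details given in the text.

```python
import numpy as np

def fit_size_frequency(dbh_cm, bin_width=10.0, min_dbh=10.0):
    """Fit log10(frequency) = intercept + slope * log10(size class).

    Assumptions (not stated in the paper): size classes are represented
    by their midpoints and empty classes are dropped before fitting.
    """
    dbh = np.asarray(dbh_cm, dtype=float)
    dbh = dbh[dbh >= min_dbh]
    # Enough 10 cm classes to cover the largest stem in the plot.
    n_bins = int(np.floor((dbh.max() - min_dbh) / bin_width)) + 1
    edges = min_dbh + bin_width * np.arange(n_bins + 1)
    counts, _ = np.histogram(dbh, bins=edges)
    mids = edges[:-1] + bin_width / 2.0
    keep = counts > 0  # the logarithm of zero is undefined
    if keep.sum() < 2:
        raise ValueError("need at least two non-empty size classes")
    slope, intercept = np.polyfit(np.log10(mids[keep]),
                                  np.log10(counts[keep]), 1)
    return slope, intercept
```

A plot whose fitted slope is much shallower than -2 holds more large stems than the metabolic scaling expectation, which is the pattern used above to flag possible buttress measurement bias.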

We obtained WSG data via the phylogenetic information 
provided by our tree inventory plots. We used a 
global wood density database to extract species average 
WSG [83]. This procedure provided over 32,000 trees 
with WSG data. When this was not possible we 
adopted a hierarchical approach, first applying the appropriate 
genus average if available (~14,000 trees) before 
considering family average (~9,500 trees), plot 
average (~4,500 trees) and dataset average (~80 trees) 
in turn [84]. Including WSG as an additional parameter 
in allometric equations reduces the biomass estimation 
error [49,85,86]. 
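
The hierarchical fallback can be illustrated with a short sketch. The function name, the dictionary layout and the field names are hypothetical, introduced only for this example; the order of the fallback levels follows the text.

```python
def assign_wsg(tree, species_wsg, genus_wsg, family_wsg,
               plot_mean=None, dataset_mean=None):
    """Return (wsg, level) from the most specific average available.

    `tree` is a dict with 'family', 'genus' and 'species' keys
    (hypothetical layout). The lookup tables map taxa to average WSG.
    """
    candidates = [
        ("species", species_wsg.get((tree.get("genus"), tree.get("species")))),
        ("genus", genus_wsg.get(tree.get("genus"))),
        ("family", family_wsg.get(tree.get("family"))),
        ("plot", plot_mean),
        ("dataset", dataset_mean),
    ]
    for level, value in candidates:
        if value is not None:
            return value, level
    raise ValueError("no wood specific gravity available for this tree")
```

Returning the level alongside the value makes it easy to tally how many trees were resolved at each step, as reported in the paragraph above.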

In addition, we estimated plot biomass using moist 
forest tree allometry [86] based on measurements of 
DBH from our tree inventory plots, WSG (as described 
above) and height data (derived from our dataset using 
the best fit DBH-height equation form [Equation 5.1; see 
Additional file 1: SI4], if not measured in the tree inven- 
tory plots). Finally, carbon was assumed to be 50% of 
biomass [7]. 

For a smaller number of plots, multiple measurements 
were available over time (n = 43; mean plot size = 0.5 ha; 
mean measurement period = 3.9 years). We calculated 
changes in carbon storage rates by dividing the differ- 
ence in carbon storage estimates between censuses by 
the number of years separating them. 
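
As a minimal sketch of the two calculations above (carbon as 50% of biomass, and the annual rate of change between censuses), with function names chosen only for illustration:

```python
def carbon_from_biomass(biomass_mg_per_ha, carbon_fraction=0.5):
    """Convert aboveground biomass to carbon, assuming 50% carbon content."""
    return biomass_mg_per_ha * carbon_fraction

def carbon_change_rate(carbon_first, carbon_last, years_between):
    """Annual change in carbon storage (Mg C per ha per year).

    Negative values indicate a net loss between the two censuses.
    """
    if years_between <= 0:
        raise ValueError("censuses must be separated by a positive interval")
    return (carbon_last - carbon_first) / years_between
```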

For our 1,611 geo-referenced tree inventory plots, we 
obtained further information on variables falling into five 
broad categories: anthropogenic, climatic, geographic, 
edaphic, and pyrologic (median resolution 1.0 ha, mean 
resolution 22.0 ha, mode resolution 1.0 ha; Additional 
file 1: Table S6). Anthropogenic data were further divided 
into six subcategories: (1) population 
pressure variables (n = 14 related variables) were obtained 
from Platts (2012) [87] (see Additional file 1: SI7); 
(2) Dar es Salaam related variables (n = 3; e.g. distance to 
Dar es Salaam), (3) market town related variables (n = 3; 
e.g. distance to market towns), and (4) infrastructure re- 
lated variables (n = 2; e.g. distance to roads) were derived 
from available topographic maps; (5) historical logging 
(n = 1) from Swetnam et al (2011) [88]; and (6) govern- 
ance (n = 1) from the World Database on Protected 
Areas [89]. Climate data were divided into three subcat- 
egories (precipitation [n = 2; maximum mean cumulative 
water deficit and mean number of dry months annu- 
ally], temperature [n = 4; mean annual temperature, 
mean annual minimum monthly temperature, mean an- 
nual monthly maximum temperature, and mean annual 
monthly temperature range] and wind speed [n = 1]) 
and were derived from the Tropical Rainfall Measuring 
Mission [90,91], WorldClim [92,93], and United States 
National Aeronautics and Space Administration Surface 
meteorology and Solar Energy [94] datasets. Similarly, 
geographic data have two variables (aspect [n = 1] and in- 
coming solar radiation [n = 1]) derived from Shuttle Radar 
Topography Mission [93] and National Renewable Energy 
Laboratory [95,96] datasets respectively. Lastly, we extracted 
edaphic data (n = 6) from the International Soil 
Reference and Information Centre database [97,98] and 
fire-related variables (n = 5) derived from MODIS im- 
ages [99]. 

We then correlated these variables with carbon stor- 
age, and following this, its components: WSG, the inter- 
cept of the power law relationship, and the gradient of 
the power law relationship, in each case using general 
linear models (see Additional file 1: SI2-5). No transfor- 
mations were required to ensure a normal distribution 
when correlating either WSG, the intercept of the power 
law relationship or the gradient of the power law rela- 
tionship with the individual variables. However, carbon 
storage estimates required a square root transformation 
to ensure a normal distribution within the general linear 
models (normality was confirmed using the Shapiro- 
Wilk test; p-value > 0.05). In all models, plots were 
weighted by the square root of their area as confidence 
in biomass estimation increases with the area surveyed 
[100,101]. Landscape scale spatial autocorrelation was 
accounted for by including spatial terms (latitude, longi- 
tude and the interactions between them) in the model 
(Additional file 1: Table S6) [102]. The numerous possible 
interactions were excluded from the models, as 
these were found to add very little explanatory power, 
only increasing R-squared values by ~0.001 
with the addition of each interaction term. All analyses 
were performed using R 2.12.1 [103] and mapped in 
ArcGIS v9.3.1 [104]. 
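
The analyses were run in R; the sketch below uses Python with NumPy purely to illustrate the structure of such a model. It assumes the weights enter as in ordinary weighted least squares and that the spatial terms are latitude, longitude and their product. The function and argument names are hypothetical.

```python
import numpy as np

def fit_weighted_model(carbon, predictors, lat, lon, area_ha):
    """Weighted least squares on square-root-transformed carbon storage.

    Columns of the design matrix: intercept, predictors, latitude,
    longitude, latitude x longitude. Plots are weighted by the square
    root of their area.
    """
    y = np.sqrt(np.asarray(carbon, dtype=float))
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    X = np.column_stack([np.ones_like(y), np.asarray(predictors, dtype=float),
                         lat, lon, lat * lon])
    weights = np.sqrt(np.asarray(area_ha, dtype=float))
    # Scaling rows by sqrt(weight) minimises sum(weight * residual**2).
    scale = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(X * scale[:, None], y * scale, rcond=None)
    return coef
```

Predictions from such a model are on the square-root scale and need to be squared to return to Mg C per hectare.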

When assessing carbon sequestration (n = 43), fewer 
degrees of freedom were available, so explanatory 
variables needed to be grouped. We therefore conducted 
a principal components (PC) analysis, obtaining 
five PCs which explained >90% of the cumulative variance 
of the individual influential variables (Additional 
file 1: Table S4). Then, covariation of the PCs with carbon 
sequestration was assessed instead of the individual 
influential variables. Carbon sequestration estimates required 
a cube-root transformation to ensure a normal 
distribution within the general linear models (confirmed 
using the Shapiro-Wilk test; p-value > 0.05). 
This enabled the effect of multiple variables to be examined 
even with this limited dataset. PC analysis of 
the variables was performed on the scaled data using 
the prcomp function [105] within R 2.12.1 [103]. All 
other aspects of the model (weighting and spatial autocorrelation) 
were performed identically to the models 
for carbon storage and its components. 
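
For readers unfamiliar with the approach, a PC analysis on scaled data can be sketched with a singular value decomposition. This is an illustrative Python equivalent rather than the R code used in the study, and the function name and the 0.90 default are assumptions for the example.

```python
import numpy as np

def principal_components(data, variance_target=0.90):
    """Return PC scores and explained-variance shares for the leading PCs.

    Columns are centred and scaled to unit variance, and the smallest
    number of PCs whose cumulative explained variance exceeds
    `variance_target` is kept. Constant columns are not handled.
    """
    X = np.asarray(data, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    cumulative = np.cumsum(explained)
    # First index at which the cumulative share exceeds the target.
    first = int(np.searchsorted(cumulative, variance_target, side="right"))
    k = min(first + 1, len(s))
    scores = Z @ Vt[:k].T
    return scores, explained[:k]
```

The returned scores would then replace the individual explanatory variables in the general linear model of carbon sequestration.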

The most appropriate model was chosen using for- 
ward and backward stepwise selection. Forward models 
are more useful for inferring causal relationships [43] 
and so were preferentially used to infer the influential 
variables of carbon storage and sequestration. However, 
averaging forward-backwards and backward-forwards 
predictions outperforms conventional selection procedures 
[43] and so both methods were used when estimating 
the spatial distributions within the study area. 
Akaike information criterion (AIC) was used to reduce/ 
expand the models, with variable selection occurring 
when the variable reduced the mean squared error 
(MSE) under ten-fold cross validation [106]. Unlike 
model selection using R-squared, which neglects the 
principles of parsimony, AIC considers both model fit 
and complexity, resulting in better predictions and 
allowing inferences to be made from multiple models 
[107]. Model selection continued until the addition/removal 
of further variables able to reduce cross validation 
MSE no longer improved AIC, thereby producing the 
best-fit model with the lowest prediction error [43]. 
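
The cross-validation part of this procedure can be illustrated with a simplified forward selection. The sketch below is not the authors' implementation: it omits the AIC step and the plot weighting, uses ordinary least squares, and adds a variable only while ten-fold cross validated MSE keeps falling. All names are hypothetical.

```python
import numpy as np

def cv_mse(X, y, k=10, seed=0):
    """Mean squared prediction error under k-fold cross validation."""
    n = len(y)
    order = np.random.default_rng(seed).permutation(n)
    squared_error = 0.0
    for test_idx in np.array_split(order, k):
        train_idx = np.setdiff1d(order, test_idx)
        A = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A, y[train_idx], rcond=None)
        B = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        squared_error += np.sum((y[test_idx] - B @ coef) ** 2)
    return squared_error / n

def forward_select(X, y, k=10):
    """Greedy forward selection driven by cross validated MSE."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    selected, remaining = [], list(range(X.shape[1]))
    best = cv_mse(np.empty((len(y), 0)), y, k)  # intercept-only model
    while remaining:
        scores = {j: cv_mse(X[:, np.asarray(selected + [j], dtype=int)], y, k)
                  for j in remaining}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break  # no candidate lowers the prediction error
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return selected, best
```

A backward variant would start from the full model and drop variables under the same criterion.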

Within each category (anthropogenic, climatic, geo- 
graphic, edaphic, and pyrologic), some variables were 
highly correlated (Additional file 1: Table S7) and this 
may confound the stepwise procedure as each variable 
does not carry enough distinct information [108]. For 
example, all temperature related variables (Additional 
file 1: Table S7) were correlated (R-squared > 0.6). How- 
ever, it is unclear which correlated best with the vari- 
ables of interest, e.g. carbon storage and sequestration. 
Many studies include mean annual temperature in bio- 
mass models [77,109], but theory suggests that it may be 
the temperature range driving this relationship as photo- 
synthesis correlates with maximum temperatures, but 
respiration with minimum temperatures [76,110,111]. 
We found that, if we removed correlated variables prior 
to model selection, the final models were artefacts of the 
variables we had selected. For example, if we included mean 
annual temperature in the model, but not temperature 
range, then significant correlations between mean annual 
temperature and ALC storage were found. However, 
these correlations were non-significant if temperature range 
was added to the model, with the newly added variable 
showing a significant effect instead. In short, the resultant 
models were automatically biased towards a priori expectations. 
To avoid this bias, we devised a procedure by which 
the influential variables included in model selection were 
selected by their ability to explain variation within the data 
of interest (e.g. carbon storage). All variables (described 
above) were included in model selection. Once this had run 
to completion, the model was assessed. The subcategory 
with the most correlated variables retained within the 
model was selected and all but the most influential, 
significant variable were removed. For example, if all four 
temperature-related variables were included in the initial 
model and this was the largest group of variables, 
then this group would be selected. Then, if mean annual 
temperature was the most influential and significant 
temperature-related variable, all other temperature-related 
variables would be excluded in the next round of model 
selection. Stepwise model selection was then repeated 
for all remaining variables. This process was repeated 
until no highly correlated variables remained 
within the model produced. 
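
The iterative pruning can be sketched as a loop around any model selection routine. In this hypothetical example, `run_selection` stands in for the stepwise procedure and returns the coefficient of each retained variable; variables in the same subcategory are treated as the correlated group, and influence is taken as the absolute coefficient. The significance check mentioned in the text is omitted for brevity.

```python
def prune_once(retained, subcategory, influence):
    """Return the variables to drop from the largest retained subcategory."""
    groups = {}
    for name in retained:
        groups.setdefault(subcategory[name], []).append(name)
    largest = max(groups.values(), key=len)
    if len(largest) < 2:
        return set()  # every subcategory is down to a single variable
    keep = max(largest, key=lambda name: abs(influence[name]))
    return set(largest) - {keep}

def iterative_selection(candidates, subcategory, run_selection):
    """Repeat model selection until no subcategory retains several variables."""
    candidates = list(candidates)
    while True:
        influence = run_selection(candidates)  # retained variable -> coefficient
        drop = prune_once(list(influence), subcategory, influence)
        if not drop:
            return influence
        candidates = [c for c in candidates if c not in drop]
```

Because only one subcategory is pruned per round, the choice of which correlated variable survives is made by the data at each step rather than fixed in advance.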

Since only landscape-scale variation was accounted for 
by the spatial terms already included in the model (lati- 
tude, longitude and the interactions between them; 
Table 1; Additional file 1: Table S6), it was necessary to 
investigate the effect of local-scale (<10 km²) spatial 
autocorrelation [102]. To do this, the separate forward 
and backward models, containing no highly correlated 
variables (produced above), were mapped. Then, the 
sum of the model estimates within the maps was extracted 
at 1, 3, 5, 7 and 10 km² resolutions, and included 
as additional variables (representing local spatial auto- 
correlation terms) into the stepwise model selection 
process, which was re-run a final time [112]. However, 
in all cases, local spatial autocorrelation terms were 
rejected as they did not reduce cross validated MSE. 

Since it was not necessary to include local spatial auto- 
correlation terms in the models, the preliminary maps 
produced above could be regarded as final spatial repre- 
sentations of the ten best fit models, two (forward and 
backward) for each of the five variables of interest (car- 
bon storage, carbon sequestration, WSG, the intercept 
of the power law relationship and the gradient of the 
power law relationship). Each pair of maps (forward and 
backward) was then combined into a single, final 
weighted mean estimate. The ratio of the relevant cross 
validated MSE of the forward and backward models was 
used to create the weighted mean, with the model showing 
the lowest error receiving the highest weighting [43]. 
Thus, we ultimately produced five maps (from ten best 
fit models): one each for carbon storage, carbon sequestration, 
WSG, the intercept of the power law relationship, 
and the gradient of the power law relationship. As 
our carbon storage estimates were derived from data 
representing trees with a DBH greater than or equal to 
10 cm, regional estimates of ratios from Willcock et al 
(2012) were used to estimate the unmeasured component 
of ALC storage [42]; this was summed with our 
modelled carbon storage estimate, providing an estimate 
of total ALC storage. 
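
One way to realise the MSE-based weighting of the forward and backward maps is shown below. The exact weighting formula is not given in the text, so the inverse-error weights used here are an assumption; they satisfy the stated property that the model with the lower error receives the higher weight.

```python
import numpy as np

def combine_maps(forward_map, backward_map, mse_forward, mse_backward):
    """Weighted mean of two prediction maps, favouring the lower-error model.

    Each model is weighted by the other model's share of the total
    cross validated MSE (an assumed formula for illustration).
    """
    total = mse_forward + mse_backward
    weight_forward = mse_backward / total
    weight_backward = mse_forward / total
    return (weight_forward * np.asarray(forward_map, dtype=float)
            + weight_backward * np.asarray(backward_map, dtype=float))
```

With equal errors this reduces to a simple average of the two maps.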

Although the five maps produced covered the entire 
study area, we were concerned that extrapolating predic- 
tions beyond the range of observed predictor variables 
from our dataset could result in large, unquantifiable errors. 
Thus, we limited the models to localities where all 
the associated variables were within the range 
observed in our dataset, thus only interpolating within our 
correlation models for tree-dominated land cover categories. 
For any pixels outside the data range, look-up 
table methods were used in preference to the correlation 
model estimates. Thus, for every land cover in our study 
area containing trees (open woodland; closed woodland; 
forest mosaic; lowland forest; sub-montane forest; mon- 
tane forest; and upper montane forest [41]) that fell 
within the limits of our dataset, the estimate of carbon 
storage derived from the correlation equations was used. 
For all other land cover categories, and for those local- 
ities for which predictor variables fell outside the ranges 
of values used in model construction, land cover based 
look-up table values from Willcock et al (2012) were 
used to estimate ALC storage [42]. In total, look-up table 
values were applied to 52% of the landscape, although 
this was predominantly to low carbon land cover categories, 
with 86% of the EAM (which hold the majority 
of the region's tropical forest [113]) estimated using the 
correlation approach described above. Estimates of WSG 
and population structure were only made for wooded 
land cover categories, with estimates for areas within 
our dataset range being derived from the relevant correl- 
ation equations and estimates for other areas coming 
from land cover based look-up table values derived from 
the median value of our WSG and population structure 
data (weighted by the square root of plot size and de- 
rived via sampling with replacement 10,000 times) for 
each land cover category (Additional file 1: Table S8). 
For carbon sequestration, again, estimates were only 
made for wooded land cover categories; for those areas 
inside the range of our dataset, estimates derived from 
the correlation equations were used. However, unlike 
carbon storage, WSG and population structure, for areas 
outside the range of our dataset, a land cover based 
look-up table was not used as several land cover categor- 
ies were poorly represented due to the small sample size 
available (n = 43). Instead, for pixels outside the range of 
the correlation-derived carbon sequestration model (16% 
of pixels with wooded land cover), the median value of 
data from our recensused plots (again weighted by the 
square root of plot size and derived via sampling with 
replacement 10,000 times) was utilised. 
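
The rule for choosing between correlation-model and look-up estimates can be expressed compactly. The sketch below assumes pixels and plots are stored as arrays with one column per predictor; the function names are illustrative only.

```python
import numpy as np

def within_training_range(pixel_predictors, plot_predictors):
    """True for pixels whose predictors all lie inside the plot data range."""
    pixels = np.asarray(pixel_predictors, dtype=float)
    plots = np.asarray(plot_predictors, dtype=float)
    lower, upper = plots.min(axis=0), plots.max(axis=0)
    return np.all((pixels >= lower) & (pixels <= upper), axis=1)

def choose_estimate(model_estimate, lookup_estimate, is_tree_cover, in_range):
    """Use the model only for tree-dominated pixels inside the data range."""
    use_model = np.asarray(is_tree_cover, dtype=bool) & np.asarray(in_range, dtype=bool)
    return np.where(use_model,
                    np.asarray(model_estimate, dtype=float),
                    np.asarray(lookup_estimate, dtype=float))
```

The same mask can be reused for WSG and population structure, with the fallback value swapped for the relevant look-up table or median.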

For every 1 ha pixel of each map derived from correl- 
ation equations, we produced 95% confidence intervals 
(CI). If the pixel estimate was derived from the general 
linear models, then the pixel 95% CI was calculated by 
adding and subtracting the square root of the cross val- 
idation MSE. For look-up table pixels the look up table 
95% CI were used. The pixel 95% CI describes, for every 
pixel, the range we would expect each of our estimates 
to lie within. However, as we are also interested in esti- 
mating carbon storage and sequestration on a landscape 
scale, indications of uncertainty are also required at 
landscape-scale. Simply summing the pixel 95% CI to 
derive 95% CI of the overall landscape-scale estimates 
would incorrectly treat random error as a region-wide 
systematic bias. Thus, to derive 95% CI for landscape-scale 
estimates, we randomly allocated each pixel an estimate 
within the range dictated by its 95% pixel CI, and 
summed these values across the entire landscape. This 
process was performed 10,000 times and the median 
value and 95% CI (the 250th and 9,750th ranked values, 
which may not be equally distributed around the median) 
for aboveground carbon storage and sequestration 
in the study area were obtained. 
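
A minimal sketch of this resampling is given below. The text does not state which distribution was used to draw values within each pixel's CI, so a uniform draw is assumed here; the function name and seed are likewise illustrative.

```python
import numpy as np

def landscape_ci(pixel_lower, pixel_upper, n_draws=10_000, seed=0):
    """Median and 95% CI of the landscape total from per-pixel ranges.

    Each draw samples every pixel independently (uniformly, by
    assumption) within its own CI and sums across pixels.
    """
    rng = np.random.default_rng(seed)
    lower = np.asarray(pixel_lower, dtype=float)
    upper = np.asarray(pixel_upper, dtype=float)
    totals = np.empty(n_draws)
    for i in range(n_draws):
        totals[i] = rng.uniform(lower, upper).sum()
    totals.sort()
    # With 10,000 draws these are the 250th and 9,750th ranked values.
    low_rank = int(round(0.025 * n_draws)) - 1
    high_rank = int(round(0.975 * n_draws)) - 1
    return totals[low_rank], float(np.median(totals)), totals[high_rank]
```

Because pixel errors are drawn independently, the resulting interval is much narrower than the one obtained by summing all lower bounds and all upper bounds, which is the point made in the paragraph above.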

For the final model of carbon storage estimates, we inves- 
tigated how the components of carbon storage (population 
structure, WSG and tree height) interacted to ultimately 
produce the ecosystem service of carbon storage. We ob- 
tained estimates of maximum canopy height from the best 
fit DBH-height equation [Equation 5.1; see Additional file 1: 
SI4 and Additional file 7: Figure S6], and combined this 
spatially with our correlation model derived estimates of 
WSG, the intercept of the power law relationship and gradi- 
ent of the power law relationship. We then correlated these 
against our estimates of carbon storage, allowing all possible 
interactions, and selected the best-fit model (via AIC) using 
both forwards and backwards stepwise regression. 

Ethical approval for the above study was obtained 
from the Faculty of Environment Research Ethics Com- 
mittee, in accordance with the University of Leeds re- 
search ethics policy. 